Analysis Walkthrough Example

Author

Riya Sharma, based on a lecture by Aaron Kessler

Getting Started

This walkthrough covers how to visualize data in map form, using Census data pulled through the tidycensus package! We'll go through customizing the maps as well.

To begin, we must load our packages.
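The package-loading code itself isn't shown here. Judging from the functions used later in the walkthrough, a setup chunk along these lines should work. The exact list is an assumption, in particular loading tidyverse rather than just dplyr, and loading leafsync explicitly for sync().

```r
# Assumed setup, inferred from the functions used later in this walkthrough
library(tidyverse)        # %>% and select(); dplyr alone would also work
library(tidycensus)       # load_variables(), get_acs()
library(sf)               # handles the spatial data returned by geometry = TRUE
library(mapview)          # mapview(), mapviewOptions()
library(leafsync)         # sync() for linked side-by-side maps
library(leaflet.extras2)  # needed for the map_a | map_b slider comparison
library(RColorBrewer)     # brewer.pal() colour palettes
```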

Choosing Variables

We'll be using the tidycensus package to pull both Census data and geospatial boundaries.

In order to access the data used in this walkthrough, we'll need a Census API key. You can learn how to install and use one here.
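Once you have a key, tidycensus's census_api_key() registers it. In this minimal sketch, "YOUR_KEY_HERE" is a placeholder, not a working key. Without install = TRUE the key only lasts for the current R session.

```r
library(tidycensus)

# "YOUR_KEY_HERE" is a placeholder: substitute the key the Census Bureau sends you.
# Without install = TRUE the key is set for this R session only;
# census_api_key("YOUR_KEY_HERE", install = TRUE) would also save it to ~/.Renviron.
census_api_key("YOUR_KEY_HERE")

# tidycensus reads the key from this environment variable
Sys.getenv("CENSUS_API_KEY")
```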

Now, we choose variables we want to use from the American Community Survey, conducted by the US Census Bureau. There are many to choose from, and we can look at them by using the load_variables() function.

I assigned the output of load_variables() to a variable called acs. Since there are so many variables, it's helpful to view the entire acs dataframe and read the descriptions. We will pull total population (named totalpop), median household income (medincome), and median age (medage).
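Rather than scrolling through the whole table, you can also filter it. In this sketch, the year 2021 and the "acs5" dataset are assumptions, chosen to match the 2017-2021 5-year ACS that get_acs() reports using later on. It needs an internet connection.

```r
library(tidycensus)
library(dplyr)

# 2021 / "acs5" assumed to match the 2017-2021 5-year ACS used below;
# cache = TRUE keeps a local copy so later calls are faster
acs <- load_variables(2021, "acs5", cache = TRUE)

# Look up the three codes used in this walkthrough
acs %>%
  filter(name %in% c("B01003_001", "B19013_001", "B01002_001")) %>%
  select(name, label, concept)

# Or search the concept descriptions for a keyword
acs %>%
  filter(grepl("median age", concept, ignore.case = TRUE))
```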

The c() function combines these into a named vector, which we assign to the myvars variable. Each name (totalpop, medincome, medage) is paired with its ACS variable code.

Code
#choose the variables we want
myvars <- c(totalpop = "B01003_001",
            medincome = "B19013_001",
            medage = "B01002_001"
)
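Because myvars is a named vector, get_acs() labels its results with totalpop, medincome, and medage instead of the raw codes. This base-R sketch shows what the names give you. The last line only reproduces the wide column names that appear in the output below; it is not how tidycensus builds them.

```r
myvars <- c(totalpop = "B01003_001",
            medincome = "B19013_001",
            medage = "B01002_001")

names(myvars)            # the labels: "totalpop" "medincome" "medage"
unname(myvars)           # the ACS codes sent to the Census API
myvars[["medincome"]]    # "B19013_001"

# With output = "wide", each name gets an estimate (E) and a
# margin-of-error (M) column; this rebuilds those names
paste0(rep(names(myvars), each = 2), c("E", "M"))
# "totalpopE" "totalpopM" "medincomeE" "medincomeM" "medageE" "medageM"
```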

Creating a New Dataframe

Now, we pull the information for GA counties. To do so, we use the get_acs() function. The arguments are as follows:

  • geography = “county”: we pull data for each county
  • variables = c(myvars): we pass in the variables we chose previously (totalpop, medincome, medage)
  • state = “GA”: we limit the pull to counties in Georgia
  • output = “wide”: we get one row per county, with each variable in its own columns, which is easier to read than the default of one row per county per variable
  • geometry = TRUE: we add each county's boundary as a geometry column, returning a spatial (sf) object we can map

We're assigning the result to ga_counties_withgeo.

Code
#pull for GA counties
ga_counties_withgeo <- get_acs(geography = "county",
                               variables = c(myvars),
                               state = "GA",
                               output = "wide",
                               geometry = TRUE)
Getting data from the 2017-2021 5-year ACS
Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.

Code
ga_counties_withgeo
Simple feature collection with 159 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -85.60516 ymin: 30.35785 xmax: -80.84038 ymax: 35.00124
Geodetic CRS:  NAD83
First 10 features:
   GEOID                      NAME totalpopE totalpopM medincomeE medincomeM
1  13021      Bibb County, Georgia    156711        NA      43862       1778
2  13049  Charlton County, Georgia     12416        NA      45494       5791
3  13283  Treutlen County, Georgia      6410        NA      35441       9710
4  13309   Wheeler County, Georgia      7568        NA      26776       3605
5  13279    Toombs County, Georgia     26956        NA      42975       3095
6  13077    Coweta County, Georgia    144928        NA      83486       2974
7  13153   Houston County, Georgia    161177        NA      70313       3057
8  13183      Long County, Georgia     16398        NA      52742       8858
9  13163 Jefferson County, Georgia     15708        NA      42238       4150
10 13261    Sumter County, Georgia     29690        NA      36687       2163
   medageE medageM                       geometry
1     36.2     0.3 MULTIPOLYGON (((-83.89192 3...
2     40.6     1.5 MULTIPOLYGON (((-82.4156 31...
3     39.9     5.3 MULTIPOLYGON (((-82.74762 3...
4     33.6    10.0 MULTIPOLYGON (((-82.93976 3...
5     37.8     0.9 MULTIPOLYGON (((-82.48038 3...
6     38.9     0.3 MULTIPOLYGON (((-85.0132 33...
7     35.9     0.3 MULTIPOLYGON (((-83.85685 3...
8     33.7     0.8 MULTIPOLYGON (((-81.98162 3...
9     40.5     0.8 MULTIPOLYGON (((-82.66192 3...
10    37.0     1.1 MULTIPOLYGON (((-84.44381 3...
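The tidy-versus-wide distinction is easiest to see on a single county. This sketch rebuilds Bibb County's numbers from the output above in the default tidy layout, which has one row per variable with estimate and moe columns. It then pivots them wide with tidyr. It illustrates the reshaping only; it is not how tidycensus does it internally.

```r
library(tidyr)

# Bibb County's values from the output above, in the layout that
# get_acs(output = "tidy") uses: one row per variable
bibb_tidy <- data.frame(
  GEOID    = "13021",
  NAME     = "Bibb County, Georgia",
  variable = c("totalpop", "medincome", "medage"),
  estimate = c(156711, 43862, 36.2),
  moe      = c(NA, 1778, 0.3),
  stringsAsFactors = FALSE
)

# Spread to one row per county: estimates get an "E" suffix, MOEs an "M"
bibb_wide <- pivot_wider(
  bibb_tidy,
  names_from  = variable,
  values_from = c(estimate, moe),
  names_glue  = "{variable}{ifelse(.value == 'estimate', 'E', 'M')}"
)

bibb_wide
```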

We can also pull every county in the US by leaving out the state argument, but be mindful that more than 3,000 counties are difficult to read on a single map.

Code
#all counties in the US
all_counties_withgeo <- get_acs(geography = "county",
                                variables = c(myvars),
                                output = "wide",
                                geometry = TRUE)
Getting data from the 2017-2021 5-year ACS
Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Code
head(all_counties_withgeo)
Simple feature collection with 6 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -111.6345 ymin: 39.04326 xmax: -91.52874 ymax: 45.63864
Geodetic CRS:  NAD83
  GEOID                      NAME totalpopE totalpopM medincomeE medincomeM
1 20161      Riley County, Kansas     72602        NA      53296       2489
2 19159     Ringgold County, Iowa      4739        NA      57700       5058
3 30009    Carbon County, Montana     10488        NA      63178       4261
4 16007   Bear Lake County, Idaho      6327        NA      60337       7039
5 55011 Buffalo County, Wisconsin     13314        NA      61167       2352
6 31185     York County, Nebraska     14164        NA      66337       4128
  medageE medageM                       geometry
1    25.5     0.1 MULTIPOLYGON (((-96.96095 3...
2    44.3     1.0 MULTIPOLYGON (((-94.47167 4...
3    50.7     0.9 MULTIPOLYGON (((-109.7987 4...
4    38.9     1.1 MULTIPOLYGON (((-111.6345 4...
5    46.5     0.5 MULTIPOLYGON (((-92.08384 4...
6    39.5     1.2 MULTIPOLYGON (((-97.82629 4...
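If you pull every county but later want only some states, you can subset by GEOID. The first two digits of a county GEOID are the state FIPS code; every Georgia GEOID in the earlier output starts with 13. This base-R sketch uses a few GEOIDs and names copied from the outputs above.

```r
# A few rows with GEOIDs and names taken from the outputs above
counties <- data.frame(
  GEOID = c("20161", "19159", "13021", "13049"),
  NAME  = c("Riley County, Kansas", "Ringgold County, Iowa",
            "Bibb County, Georgia", "Charlton County, Georgia"),
  stringsAsFactors = FALSE
)

# Keep rows whose state FIPS prefix is "13" (Georgia)
georgia_only <- counties[substr(counties$GEOID, 1, 2) == "13", ]
georgia_only
```

On the real data, the same condition inside dplyr's filter() should work on all_counties_withgeo, and the geometry column comes along.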

As the results above show, each variable has an E (estimate) column and an M (margin of error) column. We don't need the margins of error for this analysis, so we remove those columns with the select() function: the - sign drops columns, and ends_with("M") picks out the ones whose names end in "M".

Code
#remove MOE columns - they all end with "M"
ga_counties_withgeo <- ga_counties_withgeo %>%
  select(-ends_with("M"))

ga_counties_withgeo
Simple feature collection with 159 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -85.60516 ymin: 30.35785 xmax: -80.84038 ymax: 35.00124
Geodetic CRS:  NAD83
First 10 features:
   GEOID                      NAME totalpopE medincomeE medageE
1  13021      Bibb County, Georgia    156711      43862    36.2
2  13049  Charlton County, Georgia     12416      45494    40.6
3  13283  Treutlen County, Georgia      6410      35441    39.9
4  13309   Wheeler County, Georgia      7568      26776    33.6
5  13279    Toombs County, Georgia     26956      42975    37.8
6  13077    Coweta County, Georgia    144928      83486    38.9
7  13153   Houston County, Georgia    161177      70313    35.9
8  13183      Long County, Georgia     16398      52742    33.7
9  13163 Jefferson County, Georgia     15708      42238    40.5
10 13261    Sumter County, Georgia     29690      36687    37.0
                         geometry
1  MULTIPOLYGON (((-83.89192 3...
2  MULTIPOLYGON (((-82.4156 31...
3  MULTIPOLYGON (((-82.74762 3...
4  MULTIPOLYGON (((-82.93976 3...
5  MULTIPOLYGON (((-82.48038 3...
6  MULTIPOLYGON (((-85.0132 33...
7  MULTIPOLYGON (((-83.85685 3...
8  MULTIPOLYGON (((-81.98162 3...
9  MULTIPOLYGON (((-82.66192 3...
10 MULTIPOLYGON (((-84.44381 3...
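Before dropping columns, it can help to preview what a selection helper matches. This sketch uses a one-row data frame with the same column names that get_acs() produced. Note that ends_with() ignores case by default, so a name ending in a lowercase "m" would also be dropped.

```r
library(dplyr)

# One county with the column names from get_acs(output = "wide")
bibb <- data.frame(
  GEOID = "13021", NAME = "Bibb County, Georgia",
  totalpopE = 156711, totalpopM = NA,
  medincomeE = 43862, medincomeM = 1778,
  medageE = 36.2, medageM = 0.3
)

# Preview the columns ends_with("M") matches (ignore.case = TRUE by default)
names(select(bibb, ends_with("M")))   # "totalpopM" "medincomeM" "medageM"

# Drop them, keeping GEOID, NAME, and the estimate columns
bibb_est <- select(bibb, -ends_with("M"))
names(bibb_est)   # "GEOID" "NAME" "totalpopE" "medincomeE" "medageE"
```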

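Next, we'll remove the trailing “E” from the estimate columns, which we will use for analysis. The sub() function replaces a pattern in each column name. In the pattern "([[:lower:]])E$", the $ anchors the match to the end of the name, and ([[:lower:]]) requires a lowercase letter before the E (the "\\1" puts that letter back). That extra condition keeps the NAME column from being renamed NAM.

Code
#remove the trailing "E" from the estimate columns
# $ means end of string only; requiring a lowercase letter before the E keeps "NAME" intact
colnames(ga_counties_withgeo) <- sub("([[:lower:]])E$", "\\1", colnames(ga_counties_withgeo))

ga_counties_withgeo
Simple feature collection with 159 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -85.60516 ymin: 30.35785 xmax: -80.84038 ymax: 35.00124
Geodetic CRS:  NAD83
First 10 features:
   GEOID                      NAME totalpop medincome medage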
1  13021      Bibb County, Georgia   156711     43862   36.2
2  13049  Charlton County, Georgia    12416     45494   40.6
3  13283  Treutlen County, Georgia     6410     35441   39.9
4  13309   Wheeler County, Georgia     7568     26776   33.6
5  13279    Toombs County, Georgia    26956     42975   37.8
6  13077    Coweta County, Georgia   144928     83486   38.9
7  13153   Houston County, Georgia   161177     70313   35.9
8  13183      Long County, Georgia    16398     52742   33.7
9  13163 Jefferson County, Georgia    15708     42238   40.5
10 13261    Sumter County, Georgia    29690     36687   37.0
                         geometry
1  MULTIPOLYGON (((-83.89192 3...
2  MULTIPOLYGON (((-82.4156 31...
3  MULTIPOLYGON (((-82.74762 3...
4  MULTIPOLYGON (((-82.93976 3...
5  MULTIPOLYGON (((-82.48038 3...
6  MULTIPOLYGON (((-85.0132 33...
7  MULTIPOLYGON (((-83.85685 3...
8  MULTIPOLYGON (((-81.98162 3...
9  MULTIPOLYGON (((-82.66192 3...
10 MULTIPOLYGON (((-84.44381 3...
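A plain "E$" pattern also matches the final letter of NAME, which would rename that column to NAM. Testing a pattern on a character vector of the same column names, before applying it to the data, catches this kind of slip. This base-R sketch compares the two patterns.

```r
cols <- c("GEOID", "NAME", "totalpopE", "medincomeE", "medageE", "geometry")

# "E$" matches any final E, including the last letter of NAME
sub("E$", "", cols)
# "GEOID" "NAM" "totalpop" "medincome" "medage" "geometry"

# Requiring a lowercase letter before the E leaves NAME alone;
# "\\1" puts the captured letter back
sub("([[:lower:]])E$", "\\1", cols)
# "GEOID" "NAME" "totalpop" "medincome" "medage" "geometry"
```

The POSIX class [[:lower:]] is used instead of [a-z] because letter ranges can behave differently across locales.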

Mapping GA counties with Mapview

Our first simple maps use mapview(). It takes our dataframe (ga_counties_withgeo) and, through the zcol argument, the name of the column used to color each county. The first map shows median income and the second shows median age in each GA county.

Code
mapview(ga_counties_withgeo, zcol = "medincome")
Code
mapview(ga_counties_withgeo, zcol = "medage")
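To try zcol without a Census API key, this sketch uses the North Carolina county shapefile that ships with the sf package as a stand-in for ga_counties_withgeo. Any sf polygon layer should work the same way. The layer.name argument, which sets the layer's display name, is an extra not used in the walkthrough.

```r
library(sf)
library(mapview)

# North Carolina counties bundled with sf: a stand-in for ga_counties_withgeo
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# zcol names the column that sets each polygon's fill colour
births_map <- mapview(nc, zcol = "BIR74", layer.name = "Births, 1974")

# Print births_map (or run the call without assigning it) to display the map
```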

Customizing

To jazz things up a bit, let's change from the default color palette. We do so by adding an argument called col.regions, which sets the fill colors. For the colors we use the RColorBrewer package, which houses sequential, diverging, and qualitative palettes. We are using the sequential “Greens” palette. Below is a map showing median income with a different palette.

Code
mapview(ga_counties_withgeo, zcol = "medincome", 
         col.regions = RColorBrewer::brewer.pal(9, "Greens"), 
         alpha.regions = 1)
Warning: Found less unique colors (9) than unique zcol values (159)! 
Interpolating color vector to match number of zcol values.
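The warning is harmless: "Greens" has at most nine shades, so mapview stretched them across the 159 distinct income values. This sketch looks at the palette in RColorBrewer and builds the same kind of stretched palette with base R's colorRampPalette().

```r
library(RColorBrewer)

# The nine shades of the sequential "Greens" palette (9 is its maximum)
greens <- brewer.pal(9, "Greens")
brewer.pal.info["Greens", ]   # maxcolors 9, category "seq"

# Turn the nine shades into a palette function; calling it with n
# returns n interpolated colours, e.g. one per county
greens_fn <- colorRampPalette(greens)
length(greens_fn(159))   # 159
```

Passing the function itself, as col.regions = greens_fn, would likely let mapview request as many colors as it needs and skip the warning. That is an untested assumption about mapview's handling of palette functions.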

This map's dark background appeared automatically: because the palette includes a lot of light colors, mapview switched to a dark basemap. You can turn off that behavior with the following code, and a light background can make the shading easier to read.

Code
mapviewOptions("basemaps.color.shuffle" = FALSE)
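mapview options stay in effect for the rest of the R session. This sketch checks the current setting with mapviewGetOption() and then restores every default.

```r
library(mapview)

# Stop mapview from switching to a dark basemap for light palettes
mapviewOptions(basemaps.color.shuffle = FALSE)
shuffle_setting <- mapviewGetOption("basemaps.color.shuffle")
shuffle_setting   # FALSE

# Options persist for the session; this resets all of them to their defaults
mapviewOptions(default = TRUE)
```

If you'd rather not change a global option, mapview's map.types argument (for example map.types = "CartoDB.Positron") picks the basemap for a single map.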

Here’s a new map with the light background.

Code
mapview(ga_counties_withgeo, zcol = "medincome", 
         col.regions = RColorBrewer::brewer.pal(9, "Greens"), 
         alpha.regions = 1)
Warning: Found less unique colors (9) than unique zcol values (159)! 
Interpolating color vector to match number of zcol values.

We can also compare two maps at the same time! To do this, assign each map to its own object. map_income is our map of median household income in GA counties, while map_age is our map of median age in GA counties.

Code
map_income <- mapview(ga_counties_withgeo, zcol = "medincome", 
         col.regions = RColorBrewer::brewer.pal(9, "Greens"), 
         alpha.regions = 1)
Warning: Found less unique colors (9) than unique zcol values (159)! 
Interpolating color vector to match number of zcol values.
Code
map_age <- mapview(ga_counties_withgeo, zcol = "medage", 
         col.regions = RColorBrewer::brewer.pal(9, "Greens"), 
         alpha.regions = 1)
Warning: Found less unique colors (9) than unique zcol values (97)! 
Interpolating color vector to match number of zcol values.

The sync() function shows the two maps side by side and links them, so panning or zooming one moves the other:

Code
# two maps together
sync(map_income, map_age)

We can also compare the maps with a slider by joining the two map objects with the “|” operator. This feature relies on the leaflet.extras2 package.

Code
map_income | map_age

Finally, we can turn off the legend for a cleaner appearance. Make sure your map is still interpretable without one, however: the visualizations in your projects should be accessible!

Code
mapview(ga_counties_withgeo, zcol = "medincome", 
         col.regions = RColorBrewer::brewer.pal(9, "Greens"), 
         alpha.regions = 1,
         legend = FALSE)
Warning: Found less unique colors (9) than unique zcol values (159)! 
Interpolating color vector to match number of zcol values.